Frithjof Herb's Personal Website

Can I Trust LLM Output & What is an LLM Actually Useful For?

Tags: Article, LLM, AI, Artificial-Neural-Networks

My interest in large language models (LLMs) is best thought of as recreational, though they touch on my primary fascination: artificial neural networks and the nature of information.

Even so, from time to time I am told that my perspectives offer a helpful or interesting way to think about a topic, and in this case that topic is LLMs. This article is an attempt to capture several discussions in which I have shared one such perspective: a way of understanding what an LLM actually is, how it works, and when we might trust its output, presented in an intuitive and approachable way.


What is an LLM, and How Does It Work?

At its core, a large language model (LLM) is trained to predict the next word fragment, typically called a "token" by folks working in natural language processing, from a vast array of human language data. It is given immense amounts of text, everything from scientific papers to religious texts to social media posts, and is evaluated on how well it can choose the most probable token to append to the text it has been given.
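To make that concrete, below is a minimal sketch of what "predict the next token" looks like in code. It assumes the Hugging Face transformers library and the small public GPT-2 checkpoint, both purely illustrative choices that this article does not otherwise rely on.

```python
# A minimal sketch of next-token prediction, assuming the Hugging Face
# "transformers" library and the public "gpt2" checkpoint (illustrative
# choices only, not anything this article depends on).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

prompt = "We could use the lead to lead the dog to the"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    # The model scores every token in its vocabulary at every position;
    # only the final position matters for "what comes next".
    logits = model(**inputs).logits

next_token_probs = torch.softmax(logits[0, -1], dim=-1)
top = torch.topk(next_token_probs, k=5)

for prob, token_id in zip(top.values, top.indices):
    print(f"{tokenizer.decode(int(token_id))!r:>12}  p = {prob.item():.3f}")
```

Training simply nudges the model so that the token which actually came next in the training text gets a higher probability than the alternatives; everything else the model appears to "know" is in service of that one objective.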

This task might sound simple, but it requires the LLM to model language in ways that extend well beyond pure grammar. To accurately predict human language, the LLM has to account for the context behind that language. That means picking up on intentions, common knowledge, cultural biases, and countless other nuances that make human communication so rich and varied. Take the homograph "lead": on its own, what does that word mean? Take the phrase

"Well, we could use the lead to lead the dog to the river, so it doesn't drink from the well poisned with lead"

In this sentence the word "lead" is pronounced in two different ways and carries three distinct meanings. This is not an easy thing to learn or even to teach, and it is far from the only example of its kind in the English language, let alone in other languages. In essence, the LLM is tasked with modelling the human mind in the many forms it takes, not the factual world that humans inhabit. It must learn to model the thing generating the language, where the language itself is a complex and unreliable model of the physical world we inhabit. I have come to conclude that language only works because we can draw on common experiences and concepts to which we assign words, and each of us associates a slightly different cluster of concepts and ideas with a given word; so not only are words difficult to learn to understand, we all understand the same words slightly differently. Without those common experiences, what hope does anything have of making sense of language?
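As a small aside, the model gets no help from the surface form either: in a typical tokenizer, every occurrence of "lead" maps to the same token, so context is the only route to its three meanings. The quick check below assumes the Hugging Face transformers GPT-2 tokenizer, again just an illustrative choice.

```python
# A quick check, assuming the Hugging Face "transformers" GPT-2 tokenizer
# (an illustrative choice): every occurrence of "lead" becomes the same
# token, so nothing in the input itself separates the leash, the verb,
# and the metal.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
sentence = ("Well, we could use the lead to lead the dog to the river, "
            "so it doesn't drink from the well poisoned with lead")

ids = tokenizer(sentence)["input_ids"]
tokens = tokenizer.convert_ids_to_tokens(ids)

# With this tokenizer the same (token, id) pair shows up three times.
print([(tok, tok_id) for tok, tok_id in zip(tokens, ids) if "lead" in tok])
```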

Rather than understanding the physical world directly, an LLM learns patterns in human thought as encoded in the specific word choice of a mind that has generated text on a given topic. The world an LLM builds inside itself is a world that is inherently human, a world of perspectives, emotions, and assumptions that reflect collective human psychology more than any aspect of our objective shared reality.

The LLM's "knowledge" of any factual topic, therefore, is at best as reliable as humanities' collective understanding of that topic, biases and all. Worse still, the LLM's understanding of a topic such that we understand "understanding" a topic is incidental, it learns to model what it needs to, so that it knows what is required to model the kind of mind that has been exposed to that kind of information. This means information is learned not with a preference for fact, but rather with a preference for how it contributes to correctly predicting future words, which itself is more often than not factually correct, but too frequently departs from accepted truths in ways only those who know the truth can tell.

Further still, since this text-generating engine has a very good grasp of what someone might say in a given context, regardless of what is actually known, it can easily produce untrue statements that sound like something a knowledgeable person would have said. The problem is that unless you already know the truth, you cannot tell whether what was said was real or merely sounded real. This brings the utility of LLMs into question: if I can only trust it when it tells me things I already know, what is the point?


Trusting LLM Output: Why Human Minds are a Poor Proxy for Reality

If LLMs are mirrors to the human psyche, they are reflections, not windows. They capture not the world as it is, but the world as human minds describe, debate, and distort it. The implications of this are significant when it comes to trust.

Humans often treat language as an expression of reality, but the truth is that human minds are not perfect representations of the real world; they are prone to misunderstandings, biases, and groupthink. As a result, LLMs inherit these distortions because they mirror the language patterns and concepts humans have developed, not the objective facts of the world itself.

A Few Examples of Misalignment Between Human Minds and Reality

  1. Common Misconceptions: Even highly educated people hold certain myths or misconceptions, especially in complex fields like psychology, health, or history. For instance, Ignaz Semmelweis, a Hungarian physician in the 19th century, is remembered today as a pioneer in antiseptic procedures. Yet his groundbreaking work in promoting hand hygiene to reduce maternal mortality was initially dismissed by the medical community. Despite strong evidence that handwashing could prevent infections, his ideas were met with resistance, and he faced personal and professional ridicule. It wasn't until years later, after his death, that Semmelweis's findings were widely accepted, thanks to the work of other scientists like Louis Pasteur and Joseph Lister. People rejected the truth and the evidence Semmelweis brought before them because of entrenched beliefs, professional pride, and social dynamics, none of which are real beyond how real we make them through our collective belief in them.
  2. Cultural and Group Biases: Every culture and group has narratives that are more or less aligned with reality. These can include collective biases about historical events or ingrained stereotypes. An LLM, trained on diverse human sources, may reflect these biases without discerning their truthfulness; the probability of encountering a given bias tracks how frequently it is expressed in the training text.
  3. Subjective Truths: In some areas, like ethics, politics, and even science, humans don't agree on one objective reality. Language here often reflects diverse viewpoints rather than factual consensus. This article is itself an example of that: there will be many who do, and rightfully should, disagree, since I am making a lot of claims here with no real evidence beyond intuition to back them up. An LLM captures these subjective truths as well, but when repeating information it will not always reproduce the full breadth of concurrently held 'truths', often presenting a narrow slice in a highly assertive way.

These examples aim to illustrate why LLMs, while impressive at imitating human language, are not necessarily accurate sources of factual information. The core of the problem is that LLMs represent the most common sentiments, which can be deeply untrue, and can generate authoritative-sounding arguments supporting positions that are not supported by the available evidence. This is because an LLM's "knowledge" of a topic does not mean it understands it as a scientist or historian might; rather, it knows how the topic might be discussed by people, and how the topic and knowledge of it will influence the words a person might choose when expressing themselves.
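To make the frequency point concrete, here is a toy bigram counter in Python. It is nothing like a real transformer internally, and the tiny corpus and the goldfish example are mine rather than anything from this article, but it shows how a model that only tracks what is said most often will confidently complete a sentence with a popular myth.

```python
# A toy frequency-based "language model" (nothing like a real transformer
# internally): it repeats whichever continuation is most common in its
# corpus, true or not. The corpus and example are illustrative only.
from collections import Counter, defaultdict

corpus = [
    "goldfish have a three second memory",             # a popular myth, repeated often
    "goldfish have a three second memory",
    "goldfish have a three second memory",
    "goldfish have memories lasting several months",   # closer to the truth, said once
]

# Count which word follows each word across the whole corpus.
counts = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1

def complete(word, max_words=8):
    """Greedily append the most frequent continuation at every step."""
    out = [word]
    while word in counts and len(out) < max_words:
        word = counts[word].most_common(1)[0][0]
        out.append(word)
    return " ".join(out)

print(complete("goldfish"))  # -> "goldfish have a three second memory"
```

A real LLM's statistics are vastly richer than a bigram table, but the underlying pressure is similar: what is said most often is what tends to get said back.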


LLMs as Tools for Understanding Humans

Despite these limitations, LLMs are remarkably valuable in one particular way: they offer insights into the human psyche. An LLM is, in essence, a mirror to human minds. This makes it uniquely useful for tasks that benefit from understanding how people think, feel, and communicate, though it is important to be aware that an LLM learns a restricted distribution of the human mind. To some extent it will only give insight into minds that are near the average of its training corpus; minds that deviate too far from that average will be poorly represented. We are ill-equipped to comprehend this distribution, so speculating on any particular mind's proximity to the mean of the training corpus is best left alone, other than to say that AI-detection algorithms are bullshit if all they do is analyse text into which no special characters have been inserted as hidden watermarks.

Bridging Communication Gaps

One of the most powerful applications of LLMs is in helping bridge gaps between people who might otherwise struggle to understand each other. By modelling the language patterns of many kinds of people, an LLM can help clarify questions, ideas, or perspectives in ways that resonate with different backgrounds. It can take a vague question, shape it into something coherent, and offer phrasing that both a layperson and an expert might understand.

Enhancing Self-Reflection and Empathy

Because an LLM reflects a staggering range of human minds, and because it aims to keep a dialogue going, it can feel like interacting with someone who understands you on a deep level. In some way this is true, though not in the sense of being understood by another human; more in the sense of being understood by an advertising algorithm. However, by using an LLM to frame responses from multiple angles, it can offer new ways of looking at familiar issues, shedding light on why people think and speak as they do. This can be a powerful tool for empathy, allowing users to see their own questions or concerns reframed in ways that reveal underlying assumptions they might not have considered, or helping to reframe communications that appear rude but carry respectful connotations once the sender's background is taken into account.

Not a Knowledge Base, But a Psychological Lens

In short, an LLM is less a source of factual information and more a tool for comprehension of humans and human content. It does not "know" physics, history, or medicine in the way a textbook or human might; instead, it reflects the language people use to talk about those subjects. This is valuable not because it offers definitive answers but because it mirrors how humans engage with those topics, often revealing the assumptions, biases, and variations in understanding that shape human perspectives.


Final Thoughts: The LLM's Strength and Limitations

LLMs are powerful tools, but understanding their limits is crucial. They are intrinsically and deeply human, but the "reality" they offer is a reality shaped by human perception, not one rooted in objective truth. An LLM may generate language that sounds authoritative, but it should be approached as a conversational partner rather than an infallible source.

Ultimately, what an LLM shows us isn't the world itself, but an algorithm's imagining of humanity's imagining of the world. It is a distorted reflection of our collective psyche, and therefore an even more heavily distorted reflection of our world, meaning that what it tells us is typically more relevant to human minds than it is to any other given topic. While LLMs may not be reliable sources for hard facts, they are exceptional mirrors to human thought: tools that can enrich our understanding of each other and help bridge the gaps in our shared language and knowledge.

We can harness this comprehension of human minds to empower search, improve communication, or help us understand one another, but to use LLMs for content generation or in lieu of searching factually grounded sources of information is a terrible abuse of the tool, one which users will grow to regret with time.


LLMs may not offer the real world as it is, but they provide something just as rare and perhaps as valuable: a portrait of the human mind in all its complexity, limitations, and beauty.